ABSTRACT



A non-technical discussion and the general technical formulation 
of a statistical decision problem are given. Following this, statistical 
decision theory is used to solve a testing problem concerning a proto- 
type midget submarine. A set of rules to be followed in conducting the 
testing and reaching an optimum decision as to whether to accept the 
midget is developed. The development proceeds according to the
Bayes solution of a statistical decision problem in which the stochastic 
variables are independently and identically distributed and limited to 
take only two values. Finally, brief discussions of the assumptions 
and restrictions of statistical decision theory and the role of the Mini- 
max solution are included. 
PREFACE



Operations research in the Navy is concerned with the establishment
of a quantitative basis for command decision. To help achieve this,
the naval operations analyst is constantly seeking more useful tools.
One such tool is the new theory of statistical decision functions, which,
though presently unexploited in application, holds promise of extensive
future use.

The theory of statistical decision functions was developed in the 
decade prior to 1950 by the late Abraham Wald. The development cul- 
minated in the publication of his definitive book Statistical Decision 
Functions . The book was written for mathematicians, and is too 
cryptic for the reader of limited mathematical background. This fact, 
along with the writer's belief that statistical decision theory can be of 
practical value to the naval operations analyst, prompted the present 
thesis. The thesis is intended as an introduction to the subject, and,
except for Chapter 1 which is non-mathematical, is directed toward 
the reader who has studied calculus and has completed an elementary 
course in probability and statistics. The thesis purports to do no more 
than present the most essential elements of statistical decision theory 
and the detailed solution of a simple special case. The reader inter- 
ested in a more mature treatment is referred to Wald. 

Source material for the paper has consisted primarily of Wald's 
book and notes taken by the author during a course of instruction in 
statistical decision functions given by Professor Thomas E. Oberbeck 
at the United States Naval Postgraduate School. The contents are 
arranged in five chapters and an appendix. Chapter I is a non-technical
discussion of a type of practical problem that may be solved by statisti- 
cal decision theory. The technical treatment begins in Chapter II, where 
the general formulation of the Bayes solution of the statistical decision 
problem is presented. Chapter III introduces certain assumptions 
needed to apply the theory, and Chapter IV treats an elementary special 
case. Chapter V deals with the Minimax solution. The Appendix gives 
a review of some selected mathematical concepts needed to understand 
better the technical discussions. It is recommended that the reader 
study the Appendix before beginning Chapter II. 

The thesis was written during the period January - June, 1955
at the U. S. Naval Postgraduate School, Monterey, California. I wish 
to express my gratitude to the Navy for affording me the opportunity to 
write the thesis, to Professor Thomas E. Oberbeck for the technical 
competence and contagious enthusiasm he brought to his task as faculty 
advisor, to Professor Walter Jennings for helpful suggestions made 
while serving as second reader, and to Mrs. D. P. Slingerland for 
painstaking clerical assistance. 
CHAPTER I



A NON-TECHNICAL DISCUSSION 



1. Introduction. 

In naval planning it is often necessary to predict the future use- 
fulness of a proposed weapon or tactic. To do this, some value asso- 
ciated with the weapon or tactic, such as percentage success, average 
missed distance, or average life, is selected as a measure of the use- 
fulness of the weapon or tactic. The problem then becomes one of es- 
timating what this value, which we shall refer to as a parameter value, 
would be in a future war. 

The usual procedure for doing this is to conduct some trials. An 
estimate of what the parameter value would be in a future war is obtain- 
ed as a result of these trials. The important thing to note is that the 
estimate is not guaranteed to be correct. We intuitively suspect that 
the accuracy of the estimate increases as the number of trials conduct- 
ed increases. Hence, the number of trials to be conducted is of funda- 
mental importance. 

The question of how many trials to conduct is often decided arbi- 
trarily. Again, if the services of a statistician are available, the 
naval planner may determine the number of trials required to give, on 
the average, an arbitrarily specified degree of confidence in the esti- 
mate. In either case, some arbitrariness is retained. 

Statistical decision theory adds a refinement to this procedure.
It employs a criterion based on probability theory to select an optimum
number of trials. The process involves a sort of cost analysis
of the problem. In practical situations the cost of conducting trials
will usually be significant, and a definite cost may be associated with
a poor estimate of the parameter value. To avoid the cost of the trials
the planner is led to conduct no trials, or only a few; to avoid the cost
of a poor estimate he is led to conduct a great number of trials. Obviously,
the two considerations are opposed. The purpose of statistical
decision theory is to reconcile these two opposing considerations, and, 
by the use of the criterion, to arrive at an optimum plan concerning 
the number of trials to be conducted and the final decision to be reached. 
Let us consider an example to see what this means. 

2. Exhibit A. ^ 

Suppose the Navy is interested in a newly developed midget sub- 
marine to be launched from a mother submarine and used to kill enemy 
submarines. The question of detection is not under consideration, but 
merely the capability of the midget to effect kills. It has been decided 
that the device should be tested. Budgetary considerations, consider- 
ations of priority of the services of the testing agency, etc. dictate the 
necessity of answering the question: How many trials are likely to be 
conducted? The question may be answered by using statistical deci- 
sion theory. But before giving the answer, it is necessary to establish 
some precepts to be used in reaching it. The technical meaning of 
these precepts will be seen later, when we discuss each as a datum of 
the statistical decision problem. For the moment, let us think of them 
merely as the ingredients of a recipe. They must be put into the problem 
if we are to obtain a solution. 

^ The example is entirely fictitious (hence unclassified), and has been
chosen merely for illustration.



There are five precepts, and they are straightforward. First, we 
decide to classify each trial of the midget submarine as a success or a 
failure, accordingly as the midget succeeds or fails to achieve a kill on 
the trial. Then, in our problem, the percentage success of the midget 
submarine in a future war is the unknown parameter value described in 
the Introduction. Second, we must say something about the relative 
likelihood of the various possible parameter values, i. e. , the possible 
values of the percentage success of the midget submarine in a future 
war. Since we have no knowledge to the contrary, we assume that all 
values in the range 0 to 100% are equally likely to occur. Third, we 
decide to accept the midget if it succeeds on fifty percent or more of 
its trials, and to reject it if it does not. This decision might be based,
for example, upon an assumption that present anti-submarine attack
methods will succeed from twenty-five to seventy-five percent of the 
time in a future war. Fourth, we assume that the cost of each trial 
will be the same, and a study of the tactical situation, forces involved, 
etc. fixes the amount at $4000 per trial. Fifth, we have to establish 
the cost of a wrong decision as to whether the midget is superior to 
present anti-submarine attack methods. We may do this by making 
a careful study. The study might consider such things as the cost of
producing the midget, the number that would be produced, the cost 
of alternative weapons, etc. , and it leads us to an estimate of the 
cost of a wrong decision as shown in the following table: 

                                Percentage success of     Cost in
Decision                        midget in future war      Dollars

Accept midget (trial            more than 25%                     0
successes greater than 50%)

Accept midget (trial            less than 25%             1,000,000
successes greater than 50%)

Reject midget (trial            more than 75%             1,000,000
successes less than 50%)

Reject midget (trial            less than 75%                     0
successes less than 50%)

Cost of Decision
Table 1.



Note that the first and last lines of the table represent correct deci- 
sions and cost nothing, whereas the second and third lines represent 
wrong decisions and cost a definite amount. 

Let us elaborate upon the nature of these "costs" of wrong decision.
They are not actual amounts of money that must be paid to
someone. Rather, they may be explained as follows: If a wrong decision
is made, a certain disadvantage accrues to the Navy as a result.
The money evaluation of this disadvantage is called the cost of wrong
decision. This is much like the "cost" to a salesman who loses a
$300 commission because he elects to play golf instead of seeing a
prospective customer. He doesn't have to pay anyone the $300, but
he is nonetheless $300 worse off than he might have been. We say
he has made a "costly decision". In short, the cost of wrong decision
is the money equivalent of the loss suffered as a result of the wrong
decision.

In considering the costs shown in Table 1 it is necessary to keep 
in mind the distinction between the percentage success observed in the 
trials and the percentage success the midget submarine would have in 
a future war. The first is an estimate of the second, and is not necessarily
correct. Note that if the midget is definitely superior to present
anti-submarine attack methods (i.e., would have a percentage
success in a future war in excess of 75%) and we reject it, we must 
penalize ourselves $1,000,000. This is the cost shown in the third 
line of the table. Similarly, if the midget is definitely inferior to 
present anti-submarine attack methods (would have a percentage 
success in a future war of less than 25%) and we accept it, we must 
again penalize ourselves $1,000,000. This is the cost shown in the 
second line of the table. On the other hand, if the percentage success 
of the midget in a future war is between 25% and 75%, we do not need 
to penalize ourselves for either decision. This is not unreasonable, 
in view of our earlier assumption that present anti-submarine attack 
methods have a percentage success of 25% to 75%. For, if the mid- 
get would have a percentage success in the same range, we shall 
consider that we have really neither gained nor lost by either accept- 
ing or rejecting it. 

The solution to the problem can now be given. It takes the form 
of a table (Table 2). The question of how such a table is obtained is, 
essentially, the subject of this paper. The actual detailed procedure 
for obtaining this particular table is presented in Chapter IV. For 
the moment, let us accept the table. We can then examine and interpret
it, so that we may gain an appreciation of the role of statistical
decision theory. 

(See following page for Table 2.)

Let us note the construction of the Table. The numbers identi- 
fying the rows and columns designate, respectively, the number of 
failures and successes of the midget submarine that have been obser- 
ved in successive trials. Any set of one row designator and one col- 
umn designator locates a square in the table. This square then applies 
to the situation existing after the indicated number of failures and suc- 
cesses have been observed. For example, the square in row number 
three and column number five applies after three failures and five
successes have been observed. Each square contains two numbers 
(of dollars). They have the following meanings: 

upper number: the anticipated cost (in dollars) to the Navy
if no further trials are conducted, and a
decision is made to accept or reject the
midget on the basis of the trials conducted
thus far.

lower number: the anticipated cost (in dollars) to the Navy
if trials are continued, and a final decision
to accept or reject the midget is based on
the results of further trials.

[Table 2. Solution of Exhibit A]

The choice of the words "anticipated cost" in defining these
two numbers has been carefully made. This is because the dollar
values represented by these numbers in the table are "expected
values" in the sense of probability theory. This is discussed in
the Appendix. What the reader must understand at this point is
that the numbers are not absolute like the $4000 cost per trial and the
$1,000,000 cost of wrong decision. Rather, they are values calculated
on the basis of the likelihoods of occurrence of the possible outcomes,
much as insurance companies calculate the life expectancy of man
from the relative frequency of deaths at each age.
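The idea of an anticipated cost as an expected value can be made concrete for the upper number of a square. The following is a minimal sketch, not the procedure of Chapter IV. It assumes the precepts stated earlier (uniform prior, $1,000,000 cost of wrong decision, thresholds of 25% and 75%) and the standard result that, under a uniform prior, the parameter after the observed successes and failures follows a Beta(successes + 1, failures + 1) distribution. The names `beta_cdf` and `stop_costs` are illustrative and do not appear in the thesis.

```python
from math import comb

COST_WRONG = 1_000_000   # dollars, cost of a wrong decision (Table 1)
LOW, HIGH = 0.25, 0.75   # thresholds on the wartime percentage success


def beta_cdf(q, a, b):
    """P(P <= q) for a Beta(a, b) distribution with integer a, b >= 1.

    Uses the identity between the Beta distribution function and a
    binomial tail sum, valid for integer parameters.
    """
    n = a + b - 1
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(a, n + 1))


def stop_costs(failures, successes):
    """Expected cost of accepting and of rejecting if testing stops now.

    Assumption: uniform prior, so the posterior is Beta(successes + 1,
    failures + 1).
    """
    a, b = successes + 1, failures + 1
    cost_accept = COST_WRONG * beta_cdf(LOW, a, b)         # wrong if p < 25%
    cost_reject = COST_WRONG * (1 - beta_cdf(HIGH, a, b))  # wrong if p > 75%
    return cost_accept, cost_reject


if __name__ == "__main__":
    for position in [(0, 0), (0, 3), (3, 5)]:
        accept, reject = stop_costs(*position)
        print(position, round(accept, 2), round(reject, 2))
```

Under these assumptions the smaller of the two returned values plays the role of the upper number. At position (0, 3), for instance, accepting carries an expected cost of about $3,906, which is already less than the $4000 price of one more trial.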

The anticipated costs shown in Table 2 constitute the criterion 
used to determine an optimum solution to the problem. Hence, the 
solution is optimum relative to these costs as a criterion. Since 
the nature of the criterion is probabilistic, the final decision, also, is 
probabilistic. What this means, in practical terms, is that, if a rare 
and unlikely series of results is obtained on the conducted trials, such
as success on every trial when actual future wartime employment will 
yield a preponderance of failures, a poor decision will be made. This 
is a chance that must be taken to avoid the great cost that would cer- 
tainly occur if a very large number of trials were conducted. It does 
not invalidate the theory any more than the survival of one individual 
to age 106 invalidates the methods of insurance companies. 

We may now proceed with the interpretation of the table.
Notice that the upper number is greater than the lower number in some
of the squares, and equal to it in others. Those in which it is greater
are enclosed within the double lines. At any stage of testing corresponding
to one of these enclosed squares, the anticipated cost is less
to continue taking trials than it is to reach a decision at that time. On
the other hand, at any stage of testing corresponding to a square outside
the double lines, the anticipated cost is as little if trials are
halted, and a decision to accept or reject the midget submarine is made
on the basis of the sample already taken.

Now observe that the (0,0) position is within the double lines. 
This means that the initial anticipated cost is least if some trials are 
conducted. The number that will be conducted depends on the outcome 
of the trials. We begin in the (0,0) position, and conduct a trial. If 
it succeeds, we move right to the (0,1) position; if it fails, we move 
down to the (1,0) position. In either case, the second position is still 
within the double lines, so another trial is conducted. This process 
is continued until a position outside the double lines is reached. This 
may require anywhere from three to 13 trials. For example, if the 
first three trials all succeed, position (0, 3) will be reached. Here,
the upper entry ceases to be greater than the lower entry, so it will
pay to stop taking trials and decide, since the percentage success of
the trials conducted of 100% is greater than 50%, to accept the midget.
As another example, suppose the trial outcomes alternate from success
to failure to success, etc., in that order. This will result in
a stair-stepping down the table, returning to the main diagonal
(number of successes equal to number of failures) on alternate trials.
Eventually we must arrive outside the double lines in position (6, 7)
after 13 trials. The percentage success for the trials conducted is
then

$\frac{7}{13} \times 100 = 53.8\%$,

and again the decision is made to accept the midget. If the sequence
of outcomes leads to a position outside the double lines on the
upper side of the main diagonal, the percentage success of the trials
conducted will be greater than 50%, and the midget will be accepted; if
it leads to a position outside the double lines on the lower side of the main
diagonal, the percentage success of the trials conducted will be less than
50%, and the midget will be rejected.

With the aid of Table 2, it is now possible to answer the 

earlier question of how many trials are likely to be conducted. The 

answer consists of Table 2 and the following rule: 

Begin conducting trials and, following each trial,
note the position reached in the table. Continue 
this until a position outside the double lines is 
reached, then accept the midget if the number of 
successes exceeds the number of failures. Reject 
the midget if the number of failures exceeds the 
number of successes. The minimum number of 
trials required to reach a final decision will be 
three; the maximum number will be 13. 
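A table of this kind, and the stopping rule that goes with it, could be computed by backward induction over the positions (failures, successes). The sketch below is offered under stated assumptions and is not the thesis's own derivation, which belongs to Chapter IV. It assumes the five precepts of Exhibit A, a Beta posterior under the uniform prior, and an artificial cap `HORIZON` on the total number of trials so that the recursion ends. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb

COST_TRIAL = 4_000       # dollars per trial
COST_WRONG = 1_000_000   # dollars, cost of a wrong decision
LOW, HIGH = 0.25, 0.75   # thresholds on the wartime percentage success
HORIZON = 40             # assumed cap on trials, only to end the recursion


def beta_cdf(q, a, b):
    """P(P <= q) for Beta(a, b) with integer a, b >= 1 (binomial tail sum)."""
    n = a + b - 1
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(a, n + 1))


def stop_cost(f, s):
    """Anticipated cost of the better terminal decision after f failures, s successes."""
    a, b = s + 1, f + 1  # posterior Beta(a, b) under the uniform prior
    return COST_WRONG * min(beta_cdf(LOW, a, b), 1 - beta_cdf(HIGH, a, b))


@lru_cache(maxsize=None)
def continue_cost(f, s):
    """Anticipated cost of taking one more trial and proceeding optimally."""
    if f + s >= HORIZON:
        return float("inf")  # no further trials allowed at the cap
    p_success = (s + 1) / (f + s + 2)  # predictive chance the next trial succeeds
    return (COST_TRIAL
            + p_success * best_cost(f, s + 1)
            + (1 - p_success) * best_cost(f + 1, s))


def best_cost(f, s):
    """Smaller of the stop and continue costs at a position."""
    return min(stop_cost(f, s), continue_cost(f, s))


def keep_testing(f, s):
    """True when the position lies inside the 'double lines'."""
    return continue_cost(f, s) < stop_cost(f, s)


def trial_count_range():
    """Smallest and largest number of trials at which the rule can stop."""
    frontier, stops = {(0, 0)}, []
    while frontier:
        reached = set()
        for f, s in frontier:
            if keep_testing(f, s):
                reached.update({(f + 1, s), (f, s + 1)})
            else:
                stops.append(f + s)
        frontier = reached
    return min(stops), max(stops)


if __name__ == "__main__":
    print(trial_count_range())
```

The pair returned by `trial_count_range()` can be compared with the figures of three and 13 quoted in the rule. The sketch has not been checked against Table 2, so any difference in the upper figure should be read as a consequence of the assumptions made here rather than as a correction to the thesis.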

3. Another Aspect. 

A direct solution of the problem has been given. Let us now 
consider a possible budgetary complication. Suppose that $32,000 
has been allotted to conduct the testing of the midget submarine. This
is, of course, an illogical amount in the light of statistical decision 
theory. The solution does not divulge exactly how much the testing 
will cost. It predicts only that from three to 13 trials will be required. 

At $4000 per trial, this amounts to a cost of from $12, 000 to $52, 000. 

The dilemma can only be resolved, in the light of statistical decision
theory, by getting the allotment changed to permit the flexibility
required by the solution. Failing this, an optimum decision may be
reached, but cannot be guaranteed. If it turns out that a position
outside the double lines, such as (5,3), is reached within eight trials, an
optimum decision will be reached in spite of the limitation. On the other
hand, if we are still inside the double lines after eight trials, such as in
position (4,4), sufficient data to indicate an optimum decision has not been
collected.
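The dollar figures in this section all follow from the $4000 cost per trial. As a worked check, with n (a symbol introduced here only for illustration) denoting the number of trials conducted:

```latex
\begin{align*}
\text{cost of } n \text{ trials} &= 4000\,n \\
n = 3:  \quad 4000 \times 3  &= 12{,}000 \\
n = 13: \quad 4000 \times 13 &= 52{,}000 \\
\text{trials covered by } \$32{,}000: \quad 32{,}000 / 4000 &= 8
\end{align*}
```

The last line is why the example speaks of eight trials: positions (5,3) and (4,4) each represent eight trials in all.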

A variation of the budget problem is the case in which more than 
$52,000 is available for the testing. In such a case, expenditure be- 
yond $52,000 is, according to statistical decision theory, a waste of 
funds. The solution will have indicated the optimum decision after 13
trials, if not before, and additional trials are not called for by the 
theory. 

4. Summary. 

Exhibit A has been studied to help provide a conceptual under- 
standing of what is involved in the type of solution of the testing problem 
provided by statistical decision theory. It should be remembered that 
the precepts, i. e. , the decision to classify each trial as a success or 
a failure, the decision to either accept or reject the midget, the speci- 
fication of the cost of testing, the specification of the cost of wrong de- 
cision, and the specification of the likelihood of various values of the 
percentage success in a future war, are necessary inputs to the pro- 
blem. Finally, the solution that is obtained is optimum relative to 
the anticipated costs as the criterion, and the final decision is proba- 
bilistic in nature. 
CHAPTER II



GENERAL FORMULATION OF THE BAYES SOLUTION 
1. Basis of the Problem. 

A datum of any problem is defined to be something, actual or 
assumed, that is used as a basis for reckoning. The statistical de- 
cision problem has five of these. In Exhibit A we considered them 
intuitively as precepts. Let us now examine them in more technical 
detail, and introduce a portion of the notation of statistical decision 
theory. 

a. Stochastic Process X: A stochastic process is defined as a
countable collection of stochastic (chance) variables having a joint
cumulative probability distribution. To explore this, let us think
of a countable collection of stochastic variables

$X = \{X_i\} = \{X_1, X_2, X_3, \ldots\}$.

Let us next think of a countable set of real values, one for each
stochastic variable, i.e.,

$x = \{x_i\} = \{x_1, x_2, x_3, \ldots\}$.

By definition (see Appendix), the joint cumulative probability distribution
of all the stochastic variables in the countable collection is
the probability that $X_i \le x_i$ simultaneously for all $i$. In other
words, $F(x)$ is the probability that $X_1 \le x_1$, $X_2 \le x_2$, $X_3 \le x_3$, ...
simultaneously.

An important special case of a stochastic process should be
mentioned. It is the case where the stochastic variables $X_1, X_2, \ldots$
are independently and identically distributed. The condition
of independence means that the joint distribution function is the product
of the individual distribution functions. In this case,

$F(x) = G_1(x_1)\, G_2(x_2)\, G_3(x_3) \cdots = \prod_{i=1}^{\infty} G_i(x_i)$,

where $G_i(x_i)$ is the distribution function of the $i$th stochastic variable.
The condition that the stochastic variables be identically distributed
means that the distribution of each stochastic variable has
not only the same form (such as normal or uniform), but also the
same parameter values. Thus, we might have

$G_i(x_i) = G(x_i; \mu, \sigma)$ for all $i$,

where $G(x; \mu, \sigma)$ denotes a normal distribution with mean $\mu$ and
standard deviation $\sigma$. In this case, we may write

$F(x) = F(x; \mu, \sigma) = \prod_{i=1}^{\infty} G(x_i; \mu, \sigma)$

showing the dependence of F upon the values of the parameters, $\mu$
and $\sigma$.

The stochastic process of Exhibit A is an example of one in
which the stochastic variables, $X_i$, are assumed to be independently
and identically distributed. The outcome of each trial of the
midget submarine is considered to be a stochastic variable. Hence,
the result of the $i$th trial constitutes the stochastic variable $X_i$.
The possible particular outcomes of each trial, success or failure,
are thought of as representing particular values of the stochastic
variables. The assumption that every trial has the same chance of
succeeding as every other trial is equivalent to the assumption that
the stochastic variables are identically distributed. The two values to
which the stochastic variables are restricted (failure and success) are
denoted by 0 and 1 respectively. The common percentage success
of each stochastic variable, thought of as a parameter, is labeled p
(parameter value). This makes it possible to depict the stochastic
process diagrammatically by showing the distribution, $G(x;p)$, of one
of the identically distributed stochastic variables as in Figure 1.



[Figure 1. Distribution of One of the Stochastic Variables of Exhibit A (step function; horizontal axis marked at 0 and 1)]



In this case, we may write

$F(x) = F(x;p) = \prod_{i=1}^{\infty} G(x_i;p)$

showing the dependence of F upon the parameter p.
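The step-function distribution of a single trial, and the product form of the joint distribution, can be sketched as follows. This is an illustration under assumptions: the distribution function is taken with the "less than or equal to" convention, the product is restricted to a finite list of values, and the function names `g` and `joint_cdf` are hypothetical.

```python
def g(x, p):
    """Distribution function G(x; p) of one trial coded 0 (failure) or 1 (success).

    Assumes the convention G(x; p) = P(X <= x), where p is the chance of success.
    """
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p   # only the outcome 0 (failure) lies at or below x
    return 1.0           # both outcomes lie at or below x


def joint_cdf(xs, p):
    """F(x; p) for finitely many independent, identically distributed trials."""
    result = 1.0
    for x in xs:
        result *= g(x, p)
    return result


if __name__ == "__main__":
    # With p = 0.3, the chance that the first and third of three trials fail
    # (the second being unrestricted) is 0.7 * 1.0 * 0.7.
    print(joint_cdf([0, 1, 0], 0.3))
```

Each factor depends on the data only through the single parameter p, which is the point of writing F(x) as F(x;p).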
b. Space $\Omega$: The space $\Omega$ is defined to be a class of joint cumulative
probability distribution functions known to contain the true distribution,
$F(x)$, of the stochastic process. The elements of $\Omega$
are joint cumulative probability distribution functions and differ from
one another only in the values of their parameters. Hence, $F(x;p_1)$,
$F(x;p_2)$, $F(x;p_3)$, ... are elements of the space $\Omega$
when F depends on a single parameter. For this reason, it is often
convenient, as well as illuminating, to think of $\Omega$ as a parameter space.
Adopting this view in subsequent portions of this paper, we shall refer
to elements of the space $\Omega$ as values of this parameter. The parameter
is then regarded as a stochastic variable, P, and, as such,
is liable to take on different values with different likelihoods. Note
that, following convention, we denote the parameter in its role as a
stochastic variable by using the capital letter P, while parameter
values are denoted by the small p. In short, $\Omega$ is a class of similar
joint cumulative probability distribution functions having different
parameter values and known to contain, as an element, the particular
joint cumulative probability distribution function having the correct
parameter value, or, as we have referred to it above, the true F.
To determine an optimum way of estimating this parameter value is
the crux of the statistical decision problem.

In Exhibit A, the percentage success of the midget submarine
in a future war is the particular parameter value of interest. If
we knew it, there would be no problem. Since we do not, we regard
the unknown parameter as a stochastic variable, P. We know only
that this stochastic variable is confined to range between 0 and
100%. It was stated, as a precept, that prior to any experimentation
we would assume the true parameter value to be anywhere in
this range with equal likelihood. This is equivalent to saying that
the stochastic variable, P, is continuous and that its a priori distribution
is uniform. The uniform probability density function of
P and the associated cumulative probability distribution
function are shown in Figure 2.







A Priori Distribution of the Parameter of Exhibit A 

Figure 2. 

Note that p represents a possible value of the stochastic variable
P, and the cumulative distribution function evaluated at p represents
the probability that P < p.

c. Space $D^t$: The space $D^t$ is defined to be the space of possible
final decisions. To illustrate $D^t$, let us again refer to Exhibit
A. We recall that, at any stage of experimentation, we were always
faced with two alternative types of decisions, namely, to make a
final decision or to continue experimenting. These two types of
decisions are distinguished by defining two classes of decisions:

$D^t$: the class of all terminal decisions

$D^e$: the class of all decisions to continue experimenting,
such as take one more trial or take two more stages
of three trials each, etc.

Now, in Exhibit A, $D^t$ consisted of two elements:

$d^t_1$: accept the midget.

$d^t_2$: reject the midget.

$D^e$ consisted of a single element:

$d^e_1$: take one more trial.

In general, $D^t$ and $D^e$ are not so restricted, but may consist of as
many elements as needed to cover all possible decisions. This idea
is expressed symbolically by:

$D^t$ is a class consisting of $d^t_1, d^t_2, \ldots$

$D^e$ is a class consisting of $d^e_1, d^e_2, \ldots$

To illustrate the relation between $D^t$ and $D^e$, it is convenient to
define the class D as the class of all possible decisions. It is then
clear that $D = D^t \cup D^e$. This is shown pictorially in Figure 3.



[Figure 3: diagram of the class D]

It will be recognized that the sum total of all decisions from $D^t$ and
$D^e$ are exhaustive and mutually exclusive.

d. Weight Function $W(p, d^t)$: The weight function is defined to
be a non-negative function, the value of which expresses the cost of
making the terminal decision $d^t$ when the true parameter value is
p. It is through the weight function that the cost of making a wrong
decision is introduced into the problem. If a correct decision is
made, the value of $W(p, d^t)$ will be zero; if an incorrect decision
is made, $W(p, d^t)$ may have a positive value. In general, the
weight function, like any function of two variables, may be depicted as
a surface as shown in Figure 4.




Weight Function (General)

Figure 4. 

In the special case of Exhibit A, this surface degenerates into two 
curves in space as follows: 




Weight Function for Exhibit A 
Figure 5. 
Since there are only two elements in $D^t$ for Exhibit A, it becomes more
instructive to represent Figure 5 as two curves as shown in Figure 6.




Alternative Representation of Weight Function for Exhibit A 

Figure 6. 

Either Figure 5 or Figure 6 is the equivalent of Table 1. 
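Table 1 can also be written as a small function of the two arguments of the weight function. The following is a minimal sketch: the string labels "accept" and "reject" and the function name `weight` are illustrative choices, and the treatment of the exact boundary values 25% and 75% (costed at zero here) is an assumption, since Table 1 does not cover them.

```python
COST_WRONG = 1_000_000  # dollars, from Table 1


def weight(p, decision):
    """Weight function W(p, d) of Exhibit A.

    p is the wartime chance of success as a fraction between 0 and 1;
    decision is "accept" or "reject" (hypothetical labels for the two
    terminal decisions).
    """
    if decision == "accept":
        return COST_WRONG if p < 0.25 else 0   # wrong only if midget is inferior
    if decision == "reject":
        return COST_WRONG if p > 0.75 else 0   # wrong only if midget is superior
    raise ValueError("decision must be 'accept' or 'reject'")


if __name__ == "__main__":
    for p in (0.10, 0.50, 0.90):
        print(p, weight(p, "accept"), weight(p, "reject"))
```

For any p between 25% and 75% both decisions cost nothing, which is the flat middle portion of the two curves.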

The weight function is the most difficult datum of the statistical
decision problem to specify. Since it is a datum, it must be
known before a statistical decision problem can be solved. The
operations analyst must be able to specify its value for any values
of the arguments p and $d^t$. This amounts to saying that he must
be able to assign a numerical cost to any combination of a possible
terminal decision and a possible parameter value. The question of
how to do this is one that needs extensive investigation, and offers
an opportunity for further study.

It is often possible and desirable to classify decisions as
merely right or wrong. In such case, the weight function, $W(p, d^t)$,
takes on only values 1 and 0, and is said to be a simple weight
function. Except for a scaling factor of $10^6$, this was the case in
Exhibit A.

e. Cost Function $C(x, s)$: The cost function is defined to be a
non-negative function expressing the cost of experimentation. In general,
it depends on the values, $x = x_1, x_2, \ldots$, obtained on the observations.
It also depends on the variables observed in each stage of experimentation,
and the number of stages, $s = s_1, s_2, \ldots$,
observed. However, it may be possible, and is usually desirable,
to consider the special case in which the cost of experimentation is
the same for each experiment. Then the total cost of experimentation
is proportional to the number of trials conducted. This was the
case in Exhibit A where each observation cost $4000, and the cost
function had a value of 4000 times the number of observations taken.

2. The Statistical Decision Function, $\delta(x, s)$.

A statistical decision function, $\delta$, is a set of rules which
estimates a parameter using the results of observations of a stochastic
process X. It depends on the values $x = x_1, x_2, \ldots$
obtained on the observations and on the variables observed in each
stage of experimentation as well as the number of stages, $s = s_1, s_2, \ldots$.
It is a function which prescribes a plan for conducting
experimentation and reaching a terminal decision. For example,
in Exhibit A the statistical decision function consisted of Table
2, from which instructions for experimenting and reaching a terminal
decision were obtained. The problem of statistical decision theory
is, given the stochastic process X, the space $\Omega$, the space $D^t$,
the weight function $W(p, d^t)$ and the cost function $C(x, s)$, to find the
statistical decision function that provides the optimum decision.

3. The Risk Function, $r(p, \delta)$.

Each statistical decision function $\delta$ is an element of the class
of all statistical decision functions. To select that $\delta$ from this class which
provides the optimum solution to a statistical decision problem, a
criterion is needed. That is the role of the risk function. We have
already seen, from Exhibit A, that the criterion must take account
of the conflicting costs of experimentation and wrong decision. To
introduce these costs more precisely into the risk function, let us
define

$r_1(p, \delta)$: the expected cost of decision [expected value of
$W(p, d^t)$] when p is true and $\delta$ is used.

$r_2(p, \delta)$: the expected cost of experimentation [expected
value of $C(x, s)$] when p is true and $\delta$ is
used.

Note that r^ (p,^) and r 2 (p»^) are both expected values. The 
meaning of "expected value" has been discussed briefly in connect- 
ion with the anticipated costs of Exhibit A; it is explained more tech- 
nically in the Appendix. Now, rj (p, cl) and r^ (p* </) are, respec- 
tively, the expected (average) values of W (p,d^) and C (x, s) for 
given values of p and cT . That is, W (p,d^) is averaged by the 
probability that d will be made to give r^ (p, J) , and C (x, s) 
is averaged by the probability that the values x will be obtained 
when the stages, s = Sj , s^ » . . . . Sj^ , are observed to give 
r^ (p* <i) . The notion that these averages are obtained for a parti- 
cular (p, </) should be kept clearly in mind, for we shall subsequently 
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require expected values calculated with respect to the variables p and cT. 
The risk function may now be defined to be the sum of the expected values 
of the weight function and the cost function for given values of p and cf . 
That is, 

r(p,cf) = rj(p,c^) + r2(p,<^). 

Hence, the risk function, which may take on a value for any pair of ar- 
guments (p, c^) I represents the total expected cost associated with these 
arguments. 

4. The Bayes Solution. 

The goal of statistical decision theory is to select the particular
statistical decision function, $\delta$, that prescribes the optimum
plan concerning the number of trials to be conducted and the optimum
terminal decision based on the results of these trials. The risk
function is the basic criterion to be used in making this selection.
But the risk function, as we have seen, depends on both p and $\delta$
for its value. The dependence on p makes it unsuitable, in its present
form, as a yardstick for comparing the relative merits of various
$\delta$. To overcome this difficulty, we need to remove the
dependency on p. This is accomplished by averaging out the p, leaving
a new function, the average risk, which depends on $\delta$ alone for
its value. The values of the new function may be ordered as to
magnitude, and the magnitudes will vary with $\delta$ alone.

Let us elaborate on this. It often happens that a reasonable estimate
of the likelihood of P taking on various values, p, can be given at
the outset. That is, the physics of the problem, a study of past
results, or even a shrewd analysis may provide an a priori
distribution of P. This simply means that we are able to specify,
either as an assumption or as a reasonable approximation, some
distribution function, $\xi(p)$, that describes the likelihood with
which P will take on the values within its range of possible values.
From this point on, $\xi$ is assumed to be known.² If we now take the
expected value of the risk function with respect to the a priori
distribution of P, we get, in Wald's notation,

\[ r(\xi, \delta) = \int_{\Omega} r(p, \delta) \, d\xi(p). \]

Notice that this average risk, averaged with respect to the a priori
knowledge of P, depends only on $\xi$ and $\delta$, and $\xi$ is
known. This is a significant result. It means that the average risk is
suitable as the yardstick for comparing $\delta$, since it can be
ordered as to magnitude, and the magnitudes depend only on $\delta$.
Our interest, of course, is in selecting a particular $\delta_0$ that
makes the average risk the least. That is, we want a $\delta_0$ such
that

\[ r(\xi, \delta_0) = \min_{\delta} r(\xi, \delta). \]

This is often alternatively expressed as

\[ r(\xi, \delta_0) \le r(\xi, \delta) \quad \text{for all } \delta. \]

Such a $\delta_0$ constitutes a Bayes solution. Thus, a Bayes solution
is a $\delta_0$ which minimizes the average risk, $r(\xi, \delta)$,
with respect to all $\delta$. It is to be noted that a Bayes solution
is a solution relative to a particular a priori distribution $\xi$.
The procedure employed to arrive at a Bayes solution is summarized in
the block diagram of Figure 7.
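The averaging step can be tried out numerically. The sketch below computes $r(p, \delta)$ and the average risk $r(\xi, \delta)$ for a small, hypothetical family of decision functions. Everything specific in it is an assumption made for illustration only: two-valued observations with success probability p, a uniform a priori distribution, a 0-1 weight function with cut-offs at 1/4 and 3/4, a cost of .004 per observation, and fixed-sample-size rules (take n observations, accept on a majority of successes). Such rules are only a small part of the class of all decision functions, so the minimum found here is not claimed to be the Bayes solution.

```python
from math import comb

COST = 0.004  # assumed cost of one observation


def weight(p, decision):
    """Assumed 0-1 weight function W(p, d): cost of a wrong terminal decision."""
    if decision == "accept":
        return 1.0 if p < 0.25 else 0.0
    return 1.0 if p > 0.75 else 0.0


def risk(p, n):
    """r(p, delta_n) = r1 + r2 for the rule: take n trials, accept iff 2k >= n."""
    r1 = 0.0  # expected cost of decision
    for k in range(n + 1):
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)
        decision = "accept" if 2 * k >= n else "reject"
        r1 += prob_k * weight(p, decision)
    r2 = COST * n  # expected cost of experimentation (n is fixed in advance)
    return r1 + r2


def average_risk(n, grid=1000):
    """Midpoint-rule approximation of the integral of r(p, delta_n) over a uniform prior."""
    return sum(risk((i + 0.5) / grid, n) for i in range(grid)) / grid


if __name__ == "__main__":
    risks = {n: average_risk(n) for n in range(0, 21)}
    best_n = min(risks, key=risks.get)
    print(best_n, round(risks[best_n], 4))
```

The average risk depends on n alone, so the candidate rules can be ranked directly, which is the point of removing the dependence on p.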



² The case where $\xi$ cannot be specified is discussed in Chapter V.

Block Diagram of the Buildup to a Bayes Solution

Figure 7.

CHAPTER III



ASSUMPTIONS OF STATISTICAL DECISION THEORY

1. An Assumption Concerning Each Datum. 

This chapter introduces some assumptions applied to the theory
of statistical decision functions to insure that solutions exist. A
complete study of the implications of these assumptions is not
attempted. Rather, the assumptions are briefly presented here merely
to acquaint the reader with the nature of the problem, so that he may
gain some insight into the character of the restrictions imposed by
the assumptions. A full treatment is given by Wald. One assumption
regarding each datum of the statistical decision problem is required.

a. Assumption 1: The assumption regarding the stochastic process
X is stated only for the case where the $X_i$ are independently and
identically distributed. In this case, it is assumed that the
stochastic process, X, is discrete or absolutely continuous. That is,
either each component stochastic variable is discrete, or it is
continuous and has a density function. Continuous stochastic variables
without density functions are not admitted.

b. Assumption 2: A convergence property regarding the space $\Omega$
is required. However, it is not necessary to explore the nature of
this property for our purposes, since Wald shows that it is a
consequence of Assumption 1. As such, it constitutes no additional
limitation.

c. Assumption 3: The weight function, $W(p, d^t)$, is a bounded
function of p and $d^t$. Recalling that the weight function was defined
to be a non-negative function which describes the cost of making any
particular terminal decision, $d^t$, we see that this assumption merely
excludes the possibility of any decision costing an infinite amount.

d. Assumption 4: The space $D^t$ is compact in the sense of the
metric

\[ R(d_1^t, d_2^t) = \sup_{p} \left| W(p, d_1^t) - W(p, d_2^t) \right|. \]

This assumption is fulfilled if the space $D^t$ is finite. That is, if
the number of terminal decisions which may be made is finite, the
assumption is satisfied. This will cover most cases. However, if $D^t$
is not finite, the assumption can generally be satisfied by restricting
the range of the unknown parameter to a bounded space. This
restriction appears to present no practical difficulty.

e. Assumption 5: The cost function, $C(x, s)$, satisfies the
following three conditions:

(1) $C(x, s) \ge 0$ for all x and s, and
$C(x; s_1, \ldots, s_k, s_{k+1}) \ge C(x; s_1, \ldots, s_k)$.

(2) For any given s, the cost, $C(x, s)$, is either a
bounded function of x or $C(x, s) = \infty$ identically in x.

(3) There exists a sequence, $\{c_m\}$ (m = 1, 2, ..., ad inf.),
of positive values such that $\lim_{m \to \infty} c_m = \infty$, and
$C(x, s) \ge c_m$ for all x, and for all $s = [s_1, \ldots, s_k]$
for which the set theoretical sum of $s_1, \ldots, s_k$ contains
at least m elements.



The meaning of this assumption concerning the cost of experimentation 
is given in words as follows: 

(1) The cost of experimentation cannot be negative, and the total 
cost of experimentation after an additional stage is taken can- 
not be less than it was before. 

(2) The cost of experimentation is either finite or it is impossible 
to make observations of certain variables. 

(3) Regardless of the values of the observations made or the
number of stages employed in making them, if the total
number of observations is at least m, then the cost,
$C(x, s)$, of these observations is not less than the m-th
term in some increasing sequence, $\{c_m\}$, which approaches
infinity as a limit. The basic idea of this is that there
exists some minimum value of the cost of observing m
variables beyond which it is impossible to reduce the cost
of observing m variables by rearranging the composition
of the stages of experimentation. In other words,
it is not possible to observe more variables for less
money by taking the stages wholesale.
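
As a brief check, the simple cost function of Exhibit A can be set against the three conditions. The sketch below assumes a cost of $c > 0$ per observation and writes $n(s)$, a symbol introduced here for illustration only, for the number of distinct variables observed in the stages $s_1, \ldots, s_k$:

```latex
\begin{align*}
C(x, s) &= c \, n(s), \qquad c > 0, \\
\text{(1)}\quad & c \, n(s) \ge 0, \qquad
  c \, n(s_1, \ldots, s_k, s_{k+1}) \ge c \, n(s_1, \ldots, s_k), \\
\text{(2)}\quad & C(x, s) \text{ does not depend on } x,
  \text{ so it is bounded in } x \text{ for each } s, \\
\text{(3)}\quad & n(s) \ge m \;\Longrightarrow\; C(x, s) \ge c_m
  \text{ with } c_m = c \, m, \qquad \lim_{m \to \infty} c_m = \infty .
\end{align*}
```

Under these assumptions a cost proportional to the number of observations, such as the 4000 dollars per observation of Exhibit A, satisfies Assumption 5.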

2. An Assumption Concerning the Space $\mathfrak{D}$.

An assumption concerning the space $\mathfrak{D}$ of admissible
decision functions is made in addition to the assumptions concerning
each datum. The most essential portion of the assumption is that only
those decision functions which prescribe a finite amount of
experimentation and which lead to a terminal decision are to be
considered.

3. Some Consequences of the Assumptions.

Regardless of how slight the cost of experimentation, if one
experimented an infinite amount, the cost would increase without bound.
Therefore, there exists a point beyond which further experimentation
is not profitable. This intuitive notion is developed rigorously by
Wald when he shows that, even though we limit ourselves to decision
functions which prescribe a finite amount of experimentation, we can
still approach an optimum solution arbitrarily closely under the
assumptions of this chapter.

Subject to the assumptions of this chapter, a Bayes solution
exists for any given a priori distribution, $\xi$. If it is not
practicable to specify an a priori distribution, then the decision
problem may be viewed as a zero-sum, two-person game in the sense of
von Neumann's theory of games, and a minimax solution exists. A minimax
solution is a Bayes solution relative to the least favorable a priori
distribution. The minimax solution is discussed further in Chapter V.

CHAPTER IV



THE BAYES SOLUTION FOR A SPECIAL CASE

1. General. 

The general formulation of the Bayes solution to the statistical 
decision problem was given in Chapter II, and some of the theory
underlying its development was pointed out in Chapter III. In this
chapter, we shall undertake a progressive restriction of the general
problem until, ultimately, we arrive at the special case illustrated
by Exhibit A.
Thereupon, the detailed solution of Exhibit A will be indicated. The 
first step in this process will be to consider a statistical decision pro- 
blem in which the stochastic variables are restricted to be indepen- 
dently and identically distributed, and the cost of experimentation to 
be proportional to the number of observations. Then we shall proceed 
to the case where the stochastic variables are further restricted to 
take only two values. The discussion of the latter will terminate with 
the solution of Exhibit A. 

2. Independently and Identically Distributed Stochastic Variables
with Simple Cost.

Recalling that the object of statistical decision theory is to 
find the "best" decision function, we may readily see how the restric- 
tions we are imposing will help us. By restricting the cost function 
to be simple, i. e. , by requiring the cost of experimentation to be 
proportional to the number of observations, we make it possible to 
ignore the manner in which the observations are grouped or arranged.
That is, we may consider only those decision functions for which
each stage of experimentation consists of exactly one observation.
Further, by requiring the stochastic variables $X_i$ to be
independently and identically distributed, we eliminate the need for
concern as to which particular stochastic variables are observed. As a
consequence, we may limit the decision functions considered to those
which not only prescribe a single observation per stage, but also
prescribe that the stochastic variables will be observed in order.
This is possible because the stochastic variables, being identical,
may be ordered in any desired way.

In continuing our search for a "best" decision function, we may 
now assert that, in choosing it, we need only compare the merits of 
decision functions falling into the limited category explained in the 
preceding paragraph. And since we are seeking a Bayes solution, 
the decision function we ultimately select will be the one that is
"best" in the sense of the Bayes solution of Chapter II. The reader
will recall that the Bayes solution is given relative to an a priori
distribution $\xi(p)$ in $\Omega$, and that it consists of that
decision function, $\delta_0$, which minimizes the average risk, the
average being taken with respect to $\xi$ and the minimum over all
$\delta$. With these facts in mind, we may proceed with the process of
comparing the average risk produced by each $\delta$, and the choice of
the $\delta_0$ which produces the least average risk.

Let m be a non-negative integer, and let $\delta^m$ denote a
decision function which guarantees that the total number of
observations will not exceed m. Then, for any a priori distribution
$\xi$, we may define

\[ \rho_m(\xi) = \min_{\delta^m} r(\xi, \delta^m) \]

to be the least average risk that can be found by considering only
decision functions which guarantee no more than m observations.
Similarly,

\[ \rho(\xi) = \min_{\delta} r(\xi, \delta) \]

is the least average risk to be found by considering all decision
functions, whether or not they prescribe a finite number of
observations.

A particular decision function that belongs to both classes $\delta$
and $\delta^m$, which we will be interested in, is $\delta^0$. This is
the decision function which is guaranteed to prescribe no
observations. It is of interest because it enables us to write

\[ \rho_0(\xi) = \min_{d^t} W(\xi, d^t). \]

This is an obvious, but important relation. It says simply that the
least average risk, if we consider only decision functions which
prescribe no experimentation, is equal to the minimum cost of decision.
This follows from the definition of risk (cost of experimentation
plus cost of decision) as given in Chapter II, and the fact that no
experimentation is involved.

Two remarks at this point may assist the reader in avoiding
misunderstanding. First, whereas cumulative distribution functions
(such as $\xi$) are usually employed in logical developments, the
corresponding density functions (such as the density of $\xi$) are
more often used in calculation. The distinction should be constantly
remembered. Second, the present chapter requires Assumptions 1-5 of
Chapter III, but does not require the assumption concerning
$\mathfrak{D}$, a fact the reader may have surmised from the
introduction of $\rho(\xi)$.

There are several theorems concerning the functions $\rho_m(\xi)$,
$\rho(\xi)$ and $\rho_0(\xi)$ which enable us to compare various
average risks and lead us to the Bayes solution. Perhaps the most
important of these is the recursion formula

\[ \rho_{m+1}(\xi) = \min \left[ \rho_0(\xi), \; c + \int_{-\infty}^{\infty} \rho_m(\xi_a) \, df^*(a \mid \xi) \right]. \tag{A} \]


We need to examine this formula carefully and understand it thoroughly.
It contains several symbols not given explicitly before. They are

a:  stands for a value that might be obtained if a stochastic
    variable were to be observed. When none is observed, but
    advance calculations are made with the thought in mind that
    one could be, then the symbol a may be thought of as a
    stochastic variable itself.

$f^*(a \mid p)$:  a cumulative distribution function for the
    stochastic variable a described above that would exist if p
    were the true parameter value of the joint cumulative
    distribution function F(x).

$f^*(a \mid \xi)$:  the expected cumulative distribution function of a
    obtained by calculating the expected value of $f^*(a \mid p)$.
    That is, $f^*(a \mid p)$ is weighted by the a priori knowledge,
    $\xi$, of the distribution of p in $\Omega$ to obtain the average.

c:  the cost of one observation.

$\xi_a$:  the a posteriori cumulative distribution function
    of P in $\Omega$ based upon the observation a.
    If $\xi$ is an a priori distribution and a is the
    result of a single observation, then $\xi_a$ is an
    a posteriori distribution obtained by applying
    Bayes theorem (Appendix A) to modify $\xi$ to $\xi_a$,
    the modification being based upon the observation, a.

Combining these notions, it is possible to paraphrase the recursion
formula as follows:

the least average risk produced by decision functions which
prescribe from 0 to m + 1 observations
    = the minimum of:
      (1) the least average risk produced by
          decision functions which prescribe no
          observation
      (2) the cost of one observation plus the
          expected value of the least average
          risk produced by decision functions
          which prescribe from 0 to m
          observations after the first one

This formula seems reasonable and its validity may be shown under
the assumptions of Chapter III. If we want to know the least average
risk to be had by allowing decision functions prescribing from 0 to
m + 1 observations, we can surely get at it by breaking the decision
functions we are allowing into two groups and picking the minimum
one of the two least average risks attainable from these two groups.
If the breakdown is made into (1) decision functions prescribing no
observation and (2) decision functions prescribing from 1 to m + 1
observations, we are set up to select the minimum as indicated in
the recursion formula. The least average risk attainable from the
first group is simply $\rho_0(\xi)$, as previously defined. The least
average risk attainable from the second group is more complicated.
Since this group prescribes from 1 to m + 1 observations, we are
certain to take at least one observation. This accounts for the c
in the formula. After this one certain observation is taken, its value
being a, it is possible to modify the a priori distribution $\xi(p)$
in $\Omega$ to an a posteriori distribution $\xi_a(p)$ in $\Omega$ by
the Bayes theorem of Appendix A. At this point we would want to
proceed by using the a posteriori distribution $\xi_a$, since it is an
improvement over the a priori distribution. To do so, we would
calculate the least average risk produced by decision functions
prescribing from 0 to m more observations, that is, $\rho_m(\xi_a)$,
as previously defined. This would give us an expression,

\[ c + \rho_m(\xi_a) \]

for the least average risk attainable from our second group of
decision functions. The reasoning thus far has omitted one subtle, but
key point. It is that the single observation a is never actually
taken. Therefore, we must consider all possible values that a
might take in a future observation. To do this we must regard the
value a as a stochastic variable, and compute an expected value
of $\rho_m(\xi_a)$ with respect to the distribution of a. This
accounts for the fact that the second choice on the right side of the
formula takes the form

\[ c + \int_{-\infty}^{\infty} \rho_m(\xi_a) \, df^*(a \mid \xi). \]
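
The reasoning above can be followed step by step in a short program. The sketch below is only an illustration under stated assumptions: the a priori distribution is taken to be concentrated on finitely many parameter values (a dictionary mapping each p to its weight), each observation takes the value 1 with probability p and 0 otherwise, and the names `rho0`, `posterior`, `rho` and `example_weight` are hypothetical helpers, not notation from Wald. With two possible values of a, the Stieltjes integral reduces to a two-term sum.

```python
def rho0(prior, weight, decisions):
    """rho_0(xi): least expected cost of deciding at once, with no observation."""
    return min(sum(w * weight(p, d) for p, w in prior.items()) for d in decisions)


def posterior(prior, a):
    """Bayes theorem for one two-valued observation a (1 = success, 0 = failure)."""
    like = {p: (p if a == 1 else 1.0 - p) for p in prior}
    total = sum(prior[p] * like[p] for p in prior)
    return {p: prior[p] * like[p] / total for p in prior}


def rho(m, prior, weight, decisions, c):
    """rho_m(xi) from the recursion; m is the largest number of observations allowed."""
    stop_now = rho0(prior, weight, decisions)
    if m == 0:
        return stop_now
    expected_later = 0.0
    for a in (0, 1):
        # probability of observing a under the a priori distribution
        prob_a = sum(w * (p if a == 1 else 1.0 - p) for p, w in prior.items())
        if prob_a > 0.0:
            expected_later += prob_a * rho(m - 1, posterior(prior, a), weight, decisions, c)
    return min(stop_now, c + expected_later)


def example_weight(p, d):
    """Hypothetical 0-1 cost: accepting is wrong when p is low, rejecting when p is high."""
    return 1.0 if (d == "accept") == (p < 0.5) else 0.0


if __name__ == "__main__":
    prior = {0.2: 0.5, 0.8: 0.5}
    for m in range(3):
        print(m, rho(m, prior, example_weight, ("accept", "reject"), c=0.05))
```

In this made-up example, deciding at once costs .5 on the average, while allowing one observation at a cost of .05 lowers the least average risk to about .25, so the second choice in the formula is the smaller one.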



Wald has shown that $\rho_m(\xi)$ will, for a sufficiently large
value of m, differ from $\rho(\xi)$ by an arbitrarily small amount.
This permits us to write

\[ \lim_{m \to \infty} \rho_m(\xi) = \rho(\xi) \tag{B} \]

and leads us from formula (A) to the formula

\[ \rho(\xi) = \min \left[ \rho_0(\xi), \; c + \int_{-\infty}^{\infty} \rho(\xi_a) \, df^*(a \mid \xi) \right]. \tag{C} \]

This formula is presented in the notation of the Stieltjes Integral
(see Appendix A), and does not distinguish between the case where the
stochastic variable a is discrete and the case where it is continuous.
If we desired to do so we could write

\[ \rho(\xi) = \min \left[ \rho_0(\xi), \; c + \sum_{a} \rho(\xi_a) \, f(a \mid \xi) \right] \tag{C$_1$} \]

where f is the bar graph of a discrete stochastic variable, and

\[ \rho(\xi) = \min \left[ \rho_0(\xi), \; c + \int_{-\infty}^{\infty} \rho(\xi_a) \, f'(a \mid \xi) \, da \right] \tag{C$_2$} \]

where f' is the density function of a continuous stochastic variable.

The payoff of the preceding discussion lies in the manner in which
$\rho(\xi)$ and $\rho_0(\xi)$ may be used to obtain a Bayes solution.
It is best explained by Wald when he says:

    A Bayes solution relative to a given a priori probability
    measure $\xi_0$ can immediately be given in terms of the
    functions $\rho(\xi)$ and $\rho_0(\xi)$ as follows: If
    $\rho(\xi_0) = \rho_0(\xi_0)$, do not take any observation and
    make a final decision $d_0^t$ for which
    $W(\xi_0, d_0^t) = \rho_0(\xi_0)$. If
    $\rho(\xi_0) < \rho_0(\xi_0)$, take an observation on $X_1$ and
    compute the a posteriori probability measure $\xi_{x_1}$
    corresponding to $\xi_0$ and $x_1$. If
    $\rho(\xi_{x_1}) = \rho_0(\xi_{x_1})$, stop experimentation and
    make a final decision $d_1^t$ for which
    $W(\xi_{x_1}, d_1^t) = \rho_0(\xi_{x_1})$. If
    $\rho(\xi_{x_1}) < \rho_0(\xi_{x_1})$, take an observation on
    $X_2$. In general, after the observations $x_1, \ldots, x_m$
    have been made, take an additional observation if
    $\rho(\xi_{x_1, \ldots, x_m}) < \rho_0(\xi_{x_1, \ldots, x_m})$,
    and stop experimentation with a proper terminal decision if
    $\rho(\xi_{x_1, \ldots, x_m}) = \rho_0(\xi_{x_1, \ldots, x_m})$,
    where $\xi_{x_1, \ldots, x_m}$ denotes the a posteriori
    probability measure corresponding to $\xi_0$ and
    $x_1, \ldots, x_m$.

Wald uses the term probability measure where we have been using
cumulative distribution function.

3. Stochastic Variables Limited to Two Values.

The case where the $X_i$ are restricted to take only two values is
quite special. It will arise when the value of each variable may be
considered to be a failure or a success, as in Exhibit A. In such
cases, the values of the stochastic variables are taken as 0 and 1.
These correspond respectively to failure and success. The following
shorthand notation is used to describe cumulative distribution
functions.

$\xi$:  an a priori cumulative distribution function of p in $\Omega$

$\xi_{ij}$:  the a posteriori distribution of p in $\Omega$ after i 0's
    and j 1's have been observed. $\xi_{00}$ is the same as $\xi$.

If there exists a positive integer m such that

\[ \rho_0(\xi_{mj}) \le c \quad \text{and} \quad \rho_0(\xi_{im}) \le c \qquad \text{for } i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, m, \]

then it is clear from formula (C) that

\[ \rho(\xi_{mj}) = \rho_0(\xi_{mj}) \quad \text{and} \quad \rho(\xi_{im}) = \rho_0(\xi_{im}) \qquad \text{for } i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, m. \]

This may be explained in words as follows: Suppose an integer m
exists such that when either m 0's or m 1's have been observed,
and the attendant a posteriori distributions computed, it is found
that the least average risk attainable, by allowing, from this point
on, decision functions which prescribe no experimentation, does not
exceed c. Then from formula (C) the least average risk attainable
by allowing decision functions prescribing any amount of
experimentation is equal to that which is attainable by allowing only
decision functions which call for no further experimentation.



Let us now define $p_{ij}$ to be the probability of obtaining the value
1 on a single observation when $\xi_{ij}$ is the a priori distribution.
That is,

\[ p_{ij} = f(1 \mid \xi_{ij}) \]

is the f of formula (C$_1$).

Then the probability of obtaining the value 0 on a single trial is
$1 - p_{ij}$. Using this notation, the formula (C$_1$) of the preceding
section may be adapted to the case where the stochastic variables take
only two values. It becomes

\[ \rho(\xi_{ij}) = \min \left[ \rho_0(\xi_{ij}), \; c + p_{ij} \, \rho(\xi_{i,j+1}) + (1 - p_{ij}) \, \rho(\xi_{i+1,j}) \right]. \tag{D} \]

It is this particular form of the formula, along with the defining
relation

\[ \rho_0(\xi_{ij}) = \min_{d^t} W(\xi_{ij}, d^t) = \min_{d^t} \int_0^1 W(p, d^t) \, \xi'_{ij}(p) \, dp, \tag{E} \]

where $\xi'_{ij}$ denotes the density function of $\xi_{ij}$, given
earlier and the Bayes theorem of Appendix A that we shall use in
solving Exhibit A. The details of their use are best seen by studying
the detailed solution of the problem.
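
For the two-valued case the Bayes theorem step can be written out explicitly. The following is a sketch under the assumption that the a priori distribution $\xi$ has a density, written here as $\xi'(p)$ (notation adopted for this illustration), and that each observation takes the value 1 with probability p. The last line further assumes a uniform a priori density on the interval from 0 to 1, in which case both integrals are Beta functions:

```latex
\begin{align*}
\xi'_{ij}(p) &= \frac{(1-p)^i \, p^j \, \xi'(p)}
                     {\int_0^1 (1-q)^i \, q^j \, \xi'(q) \, dq}, \\[4pt]
p_{ij} &= \int_0^1 p \, \xi'_{ij}(p) \, dp, \\[4pt]
p_{ij} &= \frac{B(j+2, \, i+1)}{B(j+1, \, i+1)} = \frac{j+1}{i+j+2}
          \qquad \text{when } \xi'(p) = 1 .
\end{align*}
```

On the diagonal (i = j) the uniform-prior value reduces to 1/2.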



4. The Solution of Exhibit A.

The dollar values given in the original presentation of Exhibit A
in Chapter I may be multiplied by $10^{-6}$ without altering the
procedure followed in solving the problem. This amounts to expressing
all costs in millions of dollars. Making this simple transformation and
converting each original "precept" of the problem into a technical
datum, as subsequently introduced, we have given:

(1) the stochastic process: $X_i = 0$ (failure) or $X_i = 1$ (success),
with probabilities $1 - p$ and p respectively.

Distribution of X
Figure 8.

(2) the a priori distribution in the parameter space: P is uniformly
distributed between 0 and 1, so that $\xi'(p) = 1$ for
$0 \le p \le 1$.

Distribution of P
Figure 9.

(3) the decision space: $D^t$ consists of two elements:

    $d_1^t$: accept the midget submarine
    $d_2^t$: reject the midget submarine

(4) the weight function:

    $W(p, d_1^t)$ = 0 for p > 1/4
                  = 1 for p < 1/4

    $W(p, d_2^t)$ = 0 for p < 3/4
                  = 1 for p > 3/4

Weight Function
Figure 10.

(5) the cost function: c = .004, the cost of a single experiment.
The Bayes solution to this problem, that is, the $\delta_0$ that we
seek, is a table. It is the same table that was given in Chapter I.
The upper entries in the cells of the table are values of
$\rho_0(\xi_{ij})$, while the lower entries are values of
$\rho(\xi_{ij})$. Hence the table provides the comparison of
$\rho(\xi_{ij})$ and $\rho_0(\xi_{ij})$ needed to determine how to
experiment and reach a terminal decision. Our immediate task is to
calculate these values of $\rho_0(\xi_{ij})$ and $\rho(\xi_{ij})$ to
complete the table. We may begin by calculating the values of
$\rho_0(\xi_{ij})$ for successive diagonal entries (i = j) from
formula (E) and Figures 9 and 10.



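\[ W(\xi_{00}, d_1^t) = \int_0^1 W(p, d_1^t) \, \xi'(p) \, dp
   = \int_0^{1/4} (1)(1) \, dp + \int_{1/4}^{1} (0)(1) \, dp
   = \Big[ \, p \, \Big]_0^{1/4} = \frac{1}{4} \]

\[ W(\xi_{00}, d_2^t) = \int_0^1 W(p, d_2^t) \, \xi'(p) \, dp
   = \int_0^{3/4} (0)(1) \, dp + \int_{3/4}^{1} (1)(1) \, dp
   = \Big[ \, p \, \Big]_{3/4}^{1} = \frac{1}{4} \]

\[ \rho_0(\xi_{00}) = \min_{d^t} W(\xi_{00}, d^t)
   = \min \left[ \frac{1}{4}, \; \frac{1}{4} \right] = \frac{1}{4} = .2500 \]

The remaining diagonal entries of $\rho_0(\xi_{ii})$ are computed using
the Bayes theorem (Appendix A, Case II) as well as formula (E) and
Figures 9 and 10.

\[ \xi'_{11}(p) = \frac{(1)(1-p)(p)}{\int_0^1 (1)(1-p)(p) \, dp}
   = \frac{p - p^2}{\left[ \frac{p^2}{2} - \frac{p^3}{3} \right]_0^1}
   = 6p - 6p^2 \]

\[ W(\xi_{11}, d_1^t) = \int_0^{1/4} (1)(6p - 6p^2) \, dp + \int_{1/4}^{1} (0)(6p - 6p^2) \, dp
   = \Big[ 3p^2 - 2p^3 \Big]_0^{1/4} = \frac{5}{32} \]

\[ W(\xi_{11}, d_2^t) = \int_0^{3/4} (0)(6p - 6p^2) \, dp + \int_{3/4}^{1} (1)(6p - 6p^2) \, dp
   = \Big[ 3p^2 - 2p^3 \Big]_{3/4}^{1} = \frac{5}{32} \]

\[ \rho_0(\xi_{11}) = \min_{d^t} W(\xi_{11}, d^t)
   = \min \left[ \frac{5}{32}, \; \frac{5}{32} \right] = \frac{5}{32} = .1563 \]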



The procedure may be generalized for all diagonal entries (i = j)
so that we have

\[ \rho_0(\xi_{ii}) = \frac{\int_0^{1/4} (1-p)^i \, p^i \, dp}{\int_0^1 (1-p)^i \, p^i \, dp}
   = \frac{\int_{3/4}^{1} (1-p)^i \, p^i \, dp}{\int_0^1 (1-p)^i \, p^i \, dp} \]

for all i. Values of this last expression may be obtained from
Tables of the Incomplete Beta Function.⁴ The use of these tables
permits easy evaluation. Values obtained are the entries shown in
the upper halves of the diagonal cells in Table 3.

⁴ Tables of the Incomplete Beta Function, Pearson, University Press,
Cambridge, 1934.

The next step is to calculate the non-diagonal (i ≠ j) upper entries.
This is done as follows, taking $\rho_0(\xi_{23})$ as an example:

\[ \xi'_{23}(p) = \frac{(1-p)^2 \, p^3}{\int_0^1 (1-p)^2 \, p^3 \, dp}
   = \frac{p^3 - 2p^4 + p^5}{\left[ \frac{p^4}{4} - \frac{2p^5}{5} + \frac{p^6}{6} \right]_0^1}
   = 60 \, (p^3 - 2p^4 + p^5) \]

\[ W(\xi_{23}, d_1^t) = \int_0^{1/4} (1) \left[ 60 \, (p^3 - 2p^4 + p^5) \right] dp
   = \Big[ 15p^4 - 24p^5 + 10p^6 \Big]_0^{1/4} = .0376 \]

\[ W(\xi_{23}, d_2^t) = \int_{3/4}^{1} (1) \left[ 60 \, (p^3 - 2p^4 + p^5) \right] dp
   = \Big[ 15p^4 - 24p^5 + 10p^6 \Big]_{3/4}^{1} = .1700 \]

\[ \rho_0(\xi_{23}) = \min \left[ .0376, \; .1700 \right] = .0376 \]

Again the procedure generalizes and we have, for i < j,

\[ \rho_0(\xi_{ij}) = \frac{\int_0^{1/4} (1-p)^i \, p^j \, dp}{\int_0^1 (1-p)^i \, p^j \, dp}. \]

For i > j we have

\[ \rho_0(\xi_{ij}) = \frac{\int_{3/4}^{1} (1-p)^i \, p^j \, dp}{\int_0^1 (1-p)^i \, p^j \, dp}. \]

As before, the evaluation may be accomplished by use of the Tables of
the Incomplete Beta Function. Note that
$\rho_0(\xi_{ij}) = \rho_0(\xi_{ji})$. This makes it necessary to
evaluate entries on only one side of the main diagonal, since the
remaining entries may be determined by symmetry.
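
Where tables of the incomplete Beta function are not at hand, the upper entries can also be computed exactly. The sketch below rests on assumptions that should be checked against the data of the problem: a uniform a priori distribution, whole-number counts i and j, and weight-function cut-offs at 1/4 and 3/4. It uses the standard identity that the incomplete Beta ratio with whole-number parameters equals a binomial tail sum; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

LOW, HIGH = Fraction(1, 4), Fraction(3, 4)  # assumed cut-offs of the weight function


def posterior_cdf(x, i, j):
    """P(p < x) after i 0's and j 1's under a uniform prior.

    The posterior density is proportional to (1-p)^i p^j; for whole-number
    i and j its integral from 0 to x equals a binomial tail sum.
    """
    n = i + j + 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(j + 1, n + 1))


def w_accept(i, j):
    """W(xi_ij, accept): probability that p lies below the lower cut-off."""
    return posterior_cdf(LOW, i, j)


def w_reject(i, j):
    """W(xi_ij, reject): probability that p lies above the upper cut-off."""
    return 1 - posterior_cdf(HIGH, i, j)


def rho0(i, j):
    """Upper table entry rho_0(xi_ij): minimum over the two terminal decisions."""
    return min(w_accept(i, j), w_reject(i, j))


if __name__ == "__main__":
    for i, j in [(0, 0), (1, 1), (2, 3), (3, 2)]:
        print((i, j), rho0(i, j), float(rho0(i, j)))
```

Exact fractions are used so that the symmetry between the (i, j) and (j, i) entries can be confirmed without rounding error.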

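With the upper entries filled in, we turn our attention to the lower
entries. They may be determined in two stages. The first stage is
just to compare $\rho_0(\xi_{ij})$ with c for each square. Since

\[ \rho(\xi_{ij}) = \min \left[ \rho_0(\xi_{ij}), \; c + p_{ij} \, \rho(\xi_{i,j+1}) + (1 - p_{ij}) \, \rho(\xi_{i+1,j}) \right] \tag{D} \]

we may immediately select $\rho_0(\xi_{ij})$ as the value of
$\rho(\xi_{ij})$ for those squares in which $\rho_0(\xi_{ij}) \le c$.
For those squares in which $\rho_0(\xi_{ij}) > c$ we must use formula
(D) and the formula for $p_{ij}$ to calculate $\rho(\xi_{ij})$. For
example, in the case of diagonal entries where $p_{ii} = 1/2$, we may
compute

\[ \rho(\xi_{99}) = \min \left[ \rho_0(\xi_{99}), \; c + p_{99} \, \rho(\xi_{9,10}) + (1 - p_{99}) \, \rho(\xi_{10,9}) \right] \]

\[ = \min \left[ .0090, \; .004 + \tfrac{1}{2}(.0039) + \tfrac{1}{2}(.0039) \right] \]

\[ = \min \left[ .0090, \; .0079 \right] = .0079 \]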
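
Formula (D) lends itself to mechanical evaluation. The sketch below is one possible arrangement and not the procedure of the text: it assumes a uniform a priori distribution (so that $p_{ij} = (j+1)/(i+j+2)$, the ratio of the two Beta integrals), weight-function cut-offs at 1/4 and 3/4, and c = .004. It stops the recursion at any square where $\rho_0(\xi_{ij}) \le c$, since continuing from such a square costs at least c. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

C = Fraction(4, 1000)                       # cost of one observation (.004)
LOW, HIGH = Fraction(1, 4), Fraction(3, 4)  # assumed weight-function cut-offs


def cdf(x, i, j):
    """P(p < x) after i 0's and j 1's, uniform prior (binomial tail identity)."""
    n = i + j + 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(j + 1, n + 1))


def rho0(i, j):
    """Upper entry: least expected cost of making a terminal decision now."""
    return min(cdf(LOW, i, j), 1 - cdf(HIGH, i, j))


def p_one(i, j):
    """p_ij: probability that the next observation is a 1 (uniform prior)."""
    return Fraction(j + 1, i + j + 2)


@lru_cache(maxsize=None)
def rho(i, j):
    """Lower entry from formula (D)."""
    stop = rho0(i, j)
    if stop <= C:  # continuing costs at least c, so stopping is best
        return stop
    p = p_one(i, j)
    go_on = C + p * rho(i, j + 1) + (1 - p) * rho(i + 1, j)
    return min(stop, go_on)


if __name__ == "__main__":
    print(float(rho0(9, 9)), float(rho(9, 9)))
    print("continue" if rho(0, 0) < rho0(0, 0) else "stop")
```

Under these assumptions the values for the (9, 9) square agree with the worked example above to within the rounding of the tables.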



In the case of non-diagonal entries, the first step is to compute
$p_{ij}$ from the formula

\[ p_{ij} = \int_0^1 f(1 \mid p) \, d\xi_{ij} = \int_0^1 f(1 \mid p) \, \xi'_{ij}(p) \, dp. \]

Upon substituting in this formula we have

\[ p_{ij} = \int_0^1 p \, \frac{(1-p)^i \, p^j}{\int_0^1 (1-p)^i \, p^j \, dp} \, dp
   = \frac{\int_0^1 (1-p)^i \, p^{j+1} \, dp}{\int_0^1 (1-p)^i \, p^j \, dp}. \]

This last expression may be evaluated using the Tables of the
Incomplete Beta Function. Once $p_{ij}$ is known, we have only to solve
formula (D) for $\rho(\xi_{ij})$. Values of $\rho(\xi_{ij})$ obtained
in this way complete Table 3.

The best sequence for calculating the lower entries is as follows:
Fill in the main diagonal entry in the lower right hand corner first.
Then progress to the left in that row. Next, move up to the next higher
diagonal entry and again work left on the row. The entries on the upper
right hand side of the diagonal can be filled in by symmetry.

The interpretation of the table, as given in Chapter I, may now
be stated in terms of the technical notation. Begin taking
observations and after each observation compare $\rho(\xi_{ij})$ with
$\rho_0(\xi_{ij})$. As long as $\rho(\xi_{ij})$ is less than
$\rho_0(\xi_{ij})$, continue taking observations. When an observation
is made such that $\rho(\xi_{ij}) = \rho_0(\xi_{ij})$, stop
experimentation and make a proper terminal decision. If i > j,
the terminal decision will be to reject the midget submarine. If
i < j, the terminal decision will be to accept the midget submarine.

Solution of Exhibit A in Technical Form



CHAPTER V



THE MINIMAX SOLUTION



1. The Minimax Solution and its Relation to the Bayes Solution.

The scope of this paper, for detailed discussion, is limited to the
Bayes solution of the statistical decision problem. Emphasis is given
to the special case in which the $X_i$ are independently and
identically distributed, and confined to take only two values.
However, to avoid having the reader assume that this constitutes all
of statistical decision theory, mention should be made of the Minimax
solution.

It was pointed out in Chapter II that the Bayes solution is
always given relative to an a priori distribution of the unknown
parameter. If such a distribution cannot be given, it may still be
possible to solve a statistical decision problem. A solution may be
obtained by viewing the decision problem as a zero-sum, two-person
game, and solving the game. A solution obtained in this manner is
termed a Minimax solution. A Minimax solution may also be obtained
in other ways. A Minimax solution, as noted in Chapter III, is a
particular Bayes solution. Specifically, it is that Bayes solution
which is given relative to the least favorable a priori distribution of
the unknown parameter.

The difference between the Bayes solution and the Minimax
solution lies in the choice of a yardstick for comparing the relative
merits of the various decision functions. The basic criterion, in
either case, is the risk function. But the modification of this
criterion, to arrive at the final yardstick, is different. The reader will
recall from Chapter II that, for a Bayes solution, the risk function,
$r(p, \delta)$, was modified to an expected risk, $r(\xi, \delta)$, by averaging out
the $p$, and this expected risk constituted the final yardstick. The
modification was accomplished by using the a priori distribution, $\xi(p)$.
The expected risk, which could then be ordered as to magnitude where
the magnitude depended on $\delta$ alone, permitted the selection of the
particular statistical decision function, $\delta_0$, that provided the least
expected risk and hence the optimum solution relative to the assumed
$\xi(p)$. In the case of a Minimax solution, we consider that an a
priori distribution is not available. Hence, the procedure employed
to modify the risk function to a suitable final yardstick must be
altered. The procedure that is used consists of taking the maximum risk
instead of the expected risk. An a priori distribution of $p$ is not required
to do this. We simply take the maximum value of the risk, $r(p, \delta)$,
for each $\delta$, by selecting the $p$ that maximizes it. That is,

$$\text{Max risk} = \max_{p \in \Omega} r(p, \delta).$$

This new function, the maximum risk, is dependent on $\delta$ alone, and
can therefore be ordered as to magnitude with the magnitude
determined by $\delta$. Again we select the particular $\delta_0$ that minimizes
the yardstick. That is, we take

$$\min_{\delta} \max_{p \in \Omega} r(p, \delta).$$

This is sometimes written

$$\max_{p \in \Omega} r(p, \delta_0) \le \max_{p \in \Omega} r(p, \delta) \quad \text{for all } \delta.$$

The $\delta_0$ of this latter expression constitutes a Minimax solution.
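
The two yardsticks may be compared on a small numerical case. The following is a minimal sketch only: the three decision functions, the three parameter values, the risk figures and the a priori probabilities are all hypothetical and are not taken from the midget submarine problem. The search is also confined to the listed decision functions, whereas a full game-theoretic treatment might admit randomized ones.

```python
# Hypothetical risk table: one row per decision function, one column per
# value of the unknown parameter p. All numbers are illustrative only.
risk = {
    "delta_1": [1.0, 4.0, 9.0],
    "delta_2": [5.0, 5.0, 5.0],
    "delta_3": [8.0, 3.0, 2.0],
}

# Assumed a priori probabilities of the three parameter values.
prior = [0.6, 0.3, 0.1]


def expected_risk(risks, prior):
    """Average the risk over p, weighting by the a priori distribution."""
    return sum(r * w for r, w in zip(risks, prior))


def bayes_solution(risk, prior):
    """Decision function giving the least expected risk."""
    return min(risk, key=lambda d: expected_risk(risk[d], prior))


def minimax_solution(risk):
    """Decision function giving the least maximum risk."""
    return min(risk, key=lambda d: max(risk[d]))


print("Bayes solution:  ", bayes_solution(risk, prior))
print("Minimax solution:", minimax_solution(risk))
```

With these figures the two yardsticks select different decision functions. The Bayes solution leans on the a priori probabilities and accepts a large risk at an unlikely parameter value, while the Minimax solution guards against the worst parameter value without reference to any a priori distribution.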

The statement that the Minimax solution is a Bayes solution
relative to the least favorable a priori distribution now seems reasonable.
For, if our a priori distribution for a Bayes solution were the least
favorable of all, it would lead us to the maximum risk, $\max_{p} r(p, \delta)$,
as a yardstick.

The procedure used to arrive at a Minimax solution is summarized 
in Figure 11 (see following page) . 

2. Relation to the Theory of Games. 

The reader familiar with von Neumann's theory of games will
recognize the procedure of the preceding section as essentially the same as
that of game theory. In fact, Wald points out the detailed
correspondence between the statistical decision problem in which no a priori
distribution is given and the zero-sum, two-person game. In the general
case, the corresponding game is a continuous one. This means that the 
question of the strict determinateness of the game must be investigated. 
Whereas the fundamental theorem of rectangular games assures the ex- 
istence of a solution to any finite game, no such assurance exists in the 
case of all infinite games. However, Wald demonstrates that, under
suitable assumptions, any statistical decision problem viewed as a con- 
tinuous game may be approximated arbitrarily closely by a finite game. 
This means that, even if the continuous game is not strictly determined, 
no practical limitation is imposed. The detailed procedure employed in 
arriving at a Minimax solution of a statistical decision problem in this 
manner involves the formulation of the problem as a game, and the 
solution of the game. It will not be covered here. 

Block Diagram of the Buildup to a Minimax Solution

Figure 11.

3. Summary.

When an operations analyst is confronted with the need to make a
decision on the basis of the results of conducted trials, the accuracy
of which depends upon the true value of an unknown parameter, and the
cost of the experimentation required to estimate the value of the
parameter is significant, a statistical decision problem is indicated. As
Wald puts it, in two sentences here taken out of context,

A statistical decision problem is formulated with reference 
to a stochastic process ... A statistical decision problem 
with reference to a stochastic process X arises only when 
the distribution F (x) is not completely known. 

Once a statistical decision problem has arisen, it must be possible to
specify the stochastic process, the parameter space, the space of
terminal decisions, the weight function and the cost function, in order
to solve it. A solution consists of determining the particular
statistical decision function that prescribes the optimum plan for
conducting experimentation and reaching a terminal decision.

The procedure employed to reach a solution involves the use of
a risk function as a basic criterion for selecting the optimum decision
function. This risk function takes account of both the cost of a wrong
decision and the cost of experimentation. If an a priori distribution
of the unknown parameter can be given, the final yardstick for
selecting the optimum decision function is the average risk; if not, the
final yardstick is the maximum risk. In either case, the yardstick
is ordered as to magnitude, and that decision function which
provides the least value of the yardstick is selected as a solution.
The first case yields a Bayes solution; the second a Minimax solution.

The consequence of the final decision is probabilistic. This 
means that the final decision may, in a particular instance, conceivably 
be a poor one. Nonetheless, the theory offers a rational approach to 
the type of problem it fits, and is superior to any other known approach. 
APPENDIX A



SOME SELECTED MATHEMATICAL CONCEPTS 



1. Probability.

Probability is a quantitative measure of the likelihood of the
occurrence of events. It is expressed by assigning a number in the
range [0, 1] to any specific event. For example, if an event is
certain to occur, it has probability 1; if it is certain not to occur,
it has probability 0. If an event has a fifty-fifty chance of occurring,
it has probability 1/2. The probability of an event may be estimated
by conducting repeated trials and employing the formula

$$\text{probability} = \frac{\text{number of successes}}{\text{number of trials}}.$$


2. Stochastic Variables. 

A stochastic variable may be defined to be a function which
associates a real number with every possible outcome of an experiment.
The outcome of any particular performance of the experiment is said
to be a value assumed by the stochastic variable, it being understood
that this outcome is a chance occurrence. A stochastic variable is
termed discrete if the number of distinct values which it may assume
is either finite or may be arranged in a sequence (i.e., is denumerable).
It is termed continuous if its possible values may be represented by an
interval on the real line, e.g., all the points $x$ such that $a < x < b$
or $-\infty < x < \infty$.

3. The Distribution of a Discrete Stochastic Variable. 

The correspondence between the values of a discrete stochastic
variable and the probabilities that it will take on these values may be
described either by a probability function (bar graph) or by a cumulative
probability distribution function (step function). As an example of this,
consider a single true die to be tossed a large number of times. A
mathematical description of the stochastic nature of this experiment may be
formulated as follows:

$X$: a stochastic variable representing the value shown
on the die after any throw.

$x_i$: real values which may be assumed by the stochastic
variable $X$, i.e., 1, 2, 3, 4, 5, and 6.

$G(x)$: the probability that $X$ will take on a value less than
or equal to $x$. $G(x) = \Pr(X \le x)$.

$g(x)$: the probability that $X$ will take on the value $x$.
$g(x) = \Pr(X = x)$.

These quantities may be displayed as follows: 





Distribution of A Discrete Stochastic Variable 
Figure 12. 

The bar graph indicates that the probability of tossing any particular
number on a given throw is the same for all numbers, and is equal to
1/6. The step function is an alternative way of presenting essentially
the same information. It permits the probability that a toss will show a
value less than or equal to any given value to be read directly. For
instance, the probability that the die will show three or less on a throw is

$$G(3) = \frac{3}{6} = \frac{1}{2},$$

a result that would be anticipated. It is to be noted that

$$\sum_{i=1}^{6} g(x_i) = 1 \quad \text{and} \quad g(x_i) \ge 0, \quad i = 1, 2, 3, 4, 5, 6.$$

Also,

$$G(0) = 0 \quad \text{and} \quad G(6) = 1.$$

These are fundamental relations associated with the probability
function and cumulative probability distribution function of the stochastic
variable $X$.
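
The relations for the true die may be checked by direct computation. The sketch below is illustrative; exact fractions are used so that the checks hold without rounding, and the names `g` and `G` simply follow the notation of this section.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]


def g(x):
    """Probability function of a true die: 1/6 on each face, 0 elsewhere."""
    return Fraction(1, 6) if x in faces else Fraction(0)


def G(x):
    """Cumulative distribution function: Pr(X <= x)."""
    return sum((g(v) for v in faces if v <= x), Fraction(0))


print(G(3))                      # probability of three or less
print(sum(g(v) for v in faces))  # total probability
print(G(0), G(6))                # end values of the step function
```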

4. The Distribution of a Continuous Stochastic Variable. 

The correspondence between the values of a continuous stochastic
variable and the probabilities that it will take on these values may
be described either by a probability density function or by a
cumulative probability distribution function. As an example of this, consider
a line six units long on which a point is to be chosen at random. This
is an experiment similar to the one used to describe the distribution
of a discrete stochastic variable, but now the value of the stochastic
variable may be any number in the closed interval [0, 6]. A
mathematical description of the stochastic nature of this experiment may be
formulated as follows:

$X$: a stochastic variable representing the coordinate of the
point selected on any try.

$x$: real values which the stochastic variable $X$ may assume.

$G(x)$: the probability that $X$ will take on a value less than
or equal to $x$. $G(x) = \Pr(X \le x)$.

$g(x)$: the probability density function of $X$.

$g(x)\,dx$: the probability that $X$ will take on a value between
$x$ and $x + dx$. $g(x)\,dx = \Pr(x < X < x + dx)$.



The probability density function and the cumulative probability distri- 
bution function associated with X may be displayed as follows: 




Probability Density Function Cumulative Distribution Function 

Distribution of a Continuous Stochastic Variable 

Figure 13. 

The particular density function of Figure 13 is said to be uniform.
This means that the stochastic variable is equally likely to take on
any one of its values and accounts for the straight, horizontal line
which represents the density function. Other stochastic variables
may have a bias such that some of the values are more likely to occur
than others, and will have density functions which are not represented
by horizontal lines. In any case, the area under the density function
will always be 1, and the cumulative distribution function will
increase monotonically to a maximum value of 1 for increasing values
of $x$. It is to be noted that

$$\int_{0}^{6} g(x)\,dx = 1 \quad \text{and} \quad g(x) \ge 0 \quad \text{for } 0 \le x \le 6.$$

Also,

$$G(0) = 0 \quad \text{and} \quad G(6) = 1.$$

These are fundamental relations associated with the probability density
function and the cumulative probability distribution function of the
stochastic variable $X$. In texts on probability theory, it is shown that,
for continuous stochastic variables, the density function, when it
exists, is the derivative of the cumulative distribution function.

This is a basic relation. It should be noted that, whereas every
stochastic variable, $X$, has a cumulative distribution function $G(x)$,
the density function

$$g(x) = \frac{dG(x)}{dx}$$

exists only if $G(x)$ is differentiable.
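
As a worked instance of this relation, consider again the point chosen at random on the line six units long. Taking the density to be uniform, as described for that example, the cumulative distribution function rises linearly from 0 to 1, and differentiating it recovers the constant density:

```latex
\begin{aligned}
G(x) &= \Pr(X \le x) = \frac{x}{6}, && 0 \le x \le 6, \\
g(x) &= \frac{dG(x)}{dx} = \frac{1}{6}, && 0 < x < 6, \\
\int_{0}^{6} g(x)\,dx &= \int_{0}^{6} \frac{1}{6}\,dx = 1 .
\end{aligned}
```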



5. The Expected Value of a Stochastic Variable. 

The expected value (average value) of a discrete stochastic
variable is defined to be

$$E(X) = \bar{x} = \sum_{\text{all } i} x_i\, g(x_i), \tag{5.1}$$

and the expected value of a continuous stochastic variable which has
a density function is defined to be

$$E(X) = \bar{x} = \int_{\text{all } x} x\, g(x)\, dx. \tag{5.2}$$

The expectation, $E(X)$, is often termed a weighted average. In the
case of a discrete stochastic variable, the "weight" associated with
$x_i$ is $g(x_i)$. In the case of a continuous stochastic variable, the
"weight" associated with $x$ is $g(x)\,dx$, i.e., the probability that
$X$ lies in $dx$. In the latter case we say, more briefly, that $x$ is
weighted by $g(x)$, the value of the probability density function.
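
Both definitions may be tried on the two examples used earlier in this appendix, the true die and the point chosen at random on a line six units long. The sketch below is illustrative only: the function names are assumptions, and the integral for the continuous case is approximated by a midpoint sum rather than evaluated analytically.

```python
from fractions import Fraction


def expected_discrete(values, g):
    """Weighted sum of the values, each value x weighted by g(x)."""
    return sum(x * g(x) for x in values)


def expected_continuous(g, a, b, steps=10_000):
    """Midpoint-sum approximation to the integral of x g(x) dx over [a, b]."""
    dx = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += x * g(x) * dx
    return total


# True die: each face has probability 1/6.
die_mean = expected_discrete(range(1, 7), lambda x: Fraction(1, 6))

# Uniform density 1/6 on the line from 0 to 6.
line_mean = expected_continuous(lambda x: 1.0 / 6.0, 0.0, 6.0)

print(die_mean, line_mean)
```

The die gives an expected value of 7/2, and the line gives a value very close to 3, the midpoint of the interval.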
In theoretical discussions, it may not be desirable to distinguish
between discrete and continuous stochastic variables nor to emphasize
continuous stochastic variables which have a density function. In such
circumstances, generally, it is only necessary to refer to the
stochastic variable, say $X$, and its cumulative probability distribution
function, say $G(x)$. To represent the expected value of $X$, it is
customary to write

$$E(X) = \int_{-\infty}^{\infty} x\, dG(x). \tag{5.3}$$

The integral on the right represents either a Riemann-Stieltjes or a
Lebesgue-Stieltjes integral according to the degree of generality of the
theory of integration under consideration. In this paper, such integrals
will be viewed as Riemann-Stieltjes integrals. In short, we may regard
the integral of equation (5.3) as a concept which includes both of the
concepts of equations (5.1) and (5.2) as special cases. We should note
that the expected value of a function of a stochastic variable may also
be defined. For example, if $h(X)$ is a function of the stochastic
variable $X$, we may write

$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\, dG(x).$$

As a further illustration of the notion of expectation and the use of
general functional notation, consider the following:

$P$: a stochastic variable.

$p$: real values that may be assumed by $P$.

$\xi(p)$: cumulative probability distribution function of $P$.

$\xi'(p)$: probability density function of $P$.
The expected value of $P$, according to equation (5.2), is

$$E(P) = \bar{p} = \int_{\text{all } p} p\, \xi'(p)\, dp.$$

This may be more conveniently expressed, using the Riemann-Stieltjes
integral, as

$$E(P) = \int_{\Omega} p\, d\xi(p),$$

where $\Omega$ is the space of $p$, and the differential $d\xi$ is used instead
of $\xi'(p)\,dp$. Thus, values of $p$ are weighted by $d\xi$, where formerly
values of $p$ were weighted by $\xi'(p)\,dp$. The symbol

$$\int_{\Omega} \cdots\, d\xi(p)$$

is a functional symbol used to express the notion of the weighting and
summing. When actual computations are carried out, $d\xi(p)$ is
replaced by its equivalent expression in $p$, and the integration is
carried out just as in elementary calculus.

6. Joint Distribution Functions.

So far we have discussed only distribution functions of a single
stochastic variable $X$. The notion of joint distribution functions of
more than one stochastic variable is often employed in probability
theory. This is nothing more than an extension of the idea of the
distribution function of a single stochastic variable. For example,
if $X_1$ and $X_2$ are two stochastic variables, then their joint
cumulative distribution function, $F(x_1, x_2)$, is the probability that
$X_1 \le x_1$ and $X_2 \le x_2$ simultaneously. That is,

$$F(x_1, x_2) = \Pr(X_1 \le x_1 \text{ and } X_2 \le x_2 \text{ simultaneously}).$$
The density function of such a distribution, $f(x_1, x_2)$, may be
represented as a surface in three-dimensional space as follows:

It is to be noted that

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1\, dx_2 = 1 \quad \text{and} \quad f(x_1, x_2) \ge 0 \quad \text{for all } x_1, x_2.$$

Also,

$$\frac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2} = f(x_1, x_2).$$

These are fundamental relations associated with the joint distribution.
In a similar manner, the analytical notion of a joint distribution
function may be extended to any number of stochastic variables, although
the geometrical representation does not apply for more than two.
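
A discrete analogue of these relations may be computed directly. The sketch below treats two true dice, which are assumed here to be independent; that assumption, and the names `f` and `F` applied to a joint probability function rather than a density, are introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def f(x1, x2):
    """Joint probability function of two dice (independence is assumed)."""
    return Fraction(1, 36) if x1 in faces and x2 in faces else Fraction(0)


def F(x1, x2):
    """Joint cumulative distribution: Pr(X1 <= x1 and X2 <= x2)."""
    return sum(
        (f(a, b) for a, b in product(faces, faces) if a <= x1 and b <= x2),
        Fraction(0),
    )


total = sum(f(a, b) for a, b in product(faces, faces))
print(total)    # the joint probabilities sum to 1
print(F(3, 3))  # both dice show three or less
print(F(6, 6))  # the joint cumulative function reaches 1
```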



7. Bayes Theorem. 

Perhaps the single mathematical concept most vital to an
understanding of statistical decision theory is the Bayes theorem of inverse
probability. To explain this theorem in terms of the example of
Chapter I, let us recall that we assumed that $P$, the true percentage
success of the midget submarine in a future war, is equally
likely to have any value between 0 and 100%. This is equivalent to
assuming that the a priori distribution of the parameter is uniform and,
at the outset, represents our best knowledge of $P$. As the problem
progresses, observations are made. These observations add to our
knowledge of $P$, and we therefore wish to modify the originally
assumed a priori distribution of $P$ to what we term an a posteriori
distribution of $P$ on the basis of the observations. Bayes theorem
provides the means to do this. That is, if an a priori distribution is
known and observations are subsequently made, Bayes theorem may
be used to modify the a priori distribution to an a posteriori
distribution on the basis of the observations. Two forms of Bayes theorem
in its application to density functions in statistical decision theory are:

Case I ($\Omega$ Discrete):

$$\xi'_m(p_j) = \frac{\xi'(p_j) \prod_{i=1}^{m} g(x_i \mid p_j)}{\sum_{k=1}^{n} \xi'(p_k) \prod_{i=1}^{m} g(x_i \mid p_k)}, \quad j = 1, 2, \ldots, n.$$

Case II ($\Omega$ Continuous):

$$\xi'_m(p) = \frac{\xi'(p) \prod_{i=1}^{m} g(x_i \mid p)}{\int_{\Omega} \xi'(p) \prod_{i=1}^{m} g(x_i \mid p)\, dp}.$$

In these formulas, $\xi'$ and $\xi'_m$ denote the a priori and a posteriori
density functions of $P$, $m$ is the number of observations,
$g(x_i \mid p)$ is a probability function when $X$ is
discrete and a probability density function when $X$ is continuous,
and the integer $n$ is the number of values $P$ may assume in Case I.

To examine the theorem further, let us study an example which
finds application in this paper. Suppose
$X_i$ $(i = 1, 2, \ldots)$: a collection of independently and identically
distributed discrete stochastic variables.

$x_i$ $(i = 1, 2, \ldots)$: real values that may be assumed by each $X_i$.
Each $X_i$ is confined to the two values 0 or 1.

$G(x)$: the common cumulative probability distribution (step)
function applicable to each of the $X_i$.

$g(x)$: the common probability function (bar graph) applicable
to each of the $X_i$.

$P$: a continuous stochastic variable representing the parameter
of $G(x)$ or $g(x)$.

$p$: real values that may be assumed by $P$, i.e., $0 \le p \le 1$.

$\xi(p)$: the a priori cumulative probability distribution
function of $P$.

$\xi'(p)$: the a priori probability density function of $P$.

$\xi_m(p)$: the a posteriori cumulative distribution function of
$P$ after $m$ observations on the $X_i$.

$\xi'_m(p)$: the a posteriori density function of $P$ after $m$
observations on the $X_i$.



The Bayes formula for the density functions of $P$ is, as in Case II,

$$\xi'_m(p) = \frac{\xi'(p) \prod_{i=1}^{m} g(x_i \mid p)}{\int_{\Omega} \xi'(p) \prod_{i=1}^{m} g(x_i \mid p)\, dp},$$

where $\Omega$ is the space of $P$. Let us amplify this with some diagrams
and sample computations. Suppose each $X_i$ is distributed according
to the following diagram:

Distribution of the $X_i$
Figure 15.

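The letter $p$ stands for a value of the unknown parameter; that is,
each $X_i$ takes the value 1 with probability $p$ and the value 0 with
probability $1 - p$. If we assume the a priori density function of $P$
to be uniform, it may be pictured as follows:

A Priori Density Function of $P$
Figure 16.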

If we take a single observation on one of the $X_i$ with the result
$x_1 = 1$, we may apply Bayes formula as follows:

$$\xi'_1(p) = \frac{(1)(p)}{\int_{0}^{1} (1)\, p\, dp} = 2p.$$

This a posteriori density function may be pictured as follows:

A Posteriori Density Function of $P$ for $x_1 = 1$.
Figure 17.

Note that the result of the single observation, through the Bayes
formula, has modified the density function of $P$ from uniform to a bias in
favor of the value 1. This is an intuitively reasonable result, since
the value $x_1 = 1$ was observed. Similarly, if the result of the
single observation had been $x_1 = 0$, the a posteriori density function
would have been modified from uniform to a bias in favor of the value
0. In that case,

$$\xi'_1(p) = \frac{(1)(1 - p)}{\int_{0}^{1} (1)(1 - p)\, dp} = 2 - 2p,$$

and the a posteriori density function becomes the following:

A Posteriori Density Function of $P$ for $x_1 = 0$.
Figure 18.
Again, if two observations had been taken with the results $x_1 = 1$ and
$x_2 = 0$, then the a posteriori density function would be

$$\xi'_2(p) = \frac{(1)(p)(1 - p)}{\int_{0}^{1} (1)(p)(1 - p)\, dp} = 6p - 6p^2,$$

and is pictured as follows:

A Posteriori Density Function of $P$ for $x_1 = 1$, $x_2 = 0$.
Figure 19.

Note that this last density function is a parabola, and has been modified
from uniform to a bias in favor of the value 1/2. This is again an
intuitively reasonable result to follow from the two observations.
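
The sample computations of this section may be repeated numerically. The sketch below is illustrative only: it assumes that each observation equals 1 with probability p and 0 with probability 1 - p, replaces the integral in the denominator of the Bayes formula by a midpoint sum, and uses function names introduced here for the purpose.

```python
def likelihood(p, observations):
    """Probability of the observed sequence of 0s and 1s for a given p."""
    result = 1.0
    for x in observations:
        result *= p if x == 1 else (1.0 - p)
    return result


def posterior_density(prior, observations, steps=10_000):
    """Return the a posteriori density of P as a function of p.

    The normalizing integral over 0 <= p <= 1 is approximated by a
    midpoint sum with the given number of steps.
    """
    dp = 1.0 / steps
    norm = 0.0
    for k in range(steps):
        p = (k + 0.5) * dp
        norm += prior(p) * likelihood(p, observations) * dp
    return lambda p: prior(p) * likelihood(p, observations) / norm


def uniform(p):
    """Uniform a priori density on the interval from 0 to 1."""
    return 1.0


after_one_success = posterior_density(uniform, [1])
after_one_failure = posterior_density(uniform, [0])
after_one_of_each = posterior_density(uniform, [1, 0])

for p in (0.25, 0.5, 0.75):
    print(p, after_one_success(p), after_one_failure(p), after_one_of_each(p))
```

At each printed value of p the three columns agree closely with 2p, 2 - 2p and 6p(1 - p) respectively. Replacing `uniform` with another a priori density shows how the a posteriori density depends on the assumption made at the outset.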